Abstract

To investigate signal regulation models of gastric cancer, databases and literature were used to construct a human signaling 
network. Topological characteristics of the network were analyzed by CytoScape. After marking gastric cancer-related genes 
extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used for the mining of 
gastric cancer-related motifs in a network with three vertices. The significant motif difference method was adopted to identify 
significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly 
different motifs, including gene ontology, function annotation of genes, and model classification. A human signaling network 
was constructed, with 1643 nodes and 5089 regulating interactions. The network was found to share the characteristics of 
other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, 
and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. 
Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of 
cell death, protein amino acid phosphorylation, and intracellular signaling cascades. The top five significantly different 
motifs were mainly cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including 
EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed 
by inhibiting signal transduction upstream and downstream of the selected motifs. 
Introduction 

Numerous studies have shown that the abnormal 
transduction of cellular signaling is closely related to 
differentiation, apoptosis, and proliferation of cells, and to 
the occurrence, progression, and prognosis of disease 
(1). According to studies of intercellular protein-protein 
interaction networks, the regulation of local signaling in 
normal tissue is different from that in tumors (2). Network 
motifs are the specific combinations of functional vertices 
and the basic building blocks of a network. Motifs can 
react to external stimuli by regulating gene expression. 
Mining cancer susceptibility genes by combining network 
motifs with gene expression profiles (3) can markedly 
improve the identification of target genes in tumor 
metastasis (4,5). 

About 90% of early gastric cancer patients with 
adequate treatment can survive for more than 5 years 
and be considered cured; however, the 5-year survival 
rate of advanced gastric cancer after treatment is less 
than 5% (6). Thus, early diagnosis is the key to improving 
treatment efficacy and increasing survival rate (7). 

In this study, to screen for gastric cancer-related 
genes and then investigate the signal-regulating 
models, we constructed a human signaling network by 
integrating information from many databases and 
references. After analyzing its topological properties, we 
mapped the verified genes onto the network and mined 
the cancer-related motifs with three vertices. Finally, we 
selected the motifs that differed significantly between 
normal and gastric cancer cells. The genes in these 
significantly different motifs were taken as the screened genes. 

Material and Methods 

Gene expression profiles 

The Gene Expression Omnibus (GEO) database 
(http://www.ncbi.nlm.nih.gov/geo/) is currently the largest 
fully public gene expression resource. It provides flexible 
mining tools that enable users to easily query, filter, 
inspect and download data within the context of their 
specific interests (8). We downloaded the gene expres- 
sion profile data of GSE2685 (9) from GEO, which was 
based on the GPL80 platform (HU6800; Affymetrix 
Human Full Length HugeneFL Array) data. A total of 30 
samples were available, including primary human 
advanced gastric cancer tissues (n = 22), and noncancer- 
ous gastric tissues (n = 8). We downloaded the raw data 
and the probe annotation files from Affymetrix for further 
analysis. The probe-level data were converted into 
expression values, log2 transformed, and standardized 
using the median method (10). 
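
The preprocessing step can be illustrated with a minimal sketch. It assumes that the "median method" means centering each sample on its own median after the log2 transform; the source does not spell this out, so this reading, the function name, and the toy intensities below are all hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(samples):
    """Log2-transform each sample and subtract its median.

    samples: dict mapping a sample name to a list of positive
    expression values (same probe order in every sample).
    Median centering per sample is an assumed reading of the
    "median method"; it is not confirmed by the source.
    """
    normalized = {}
    for name, values in samples.items():
        logged = [math.log2(v) for v in values]   # requires v > 0
        center = median(logged)
        normalized[name] = [x - center for x in logged]
    return normalized


# Toy intensities (hypothetical, not taken from GSE2685).
raw = {
    "normal_1": [2.0, 8.0, 32.0],
    "tumor_1": [4.0, 4.0, 64.0],
}
norm = log2_median_normalize(raw)
```

After this step every sample has a median of zero on the log2 scale, which makes expression values comparable across arrays before correlations are computed.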

Extraction of gastric cancer-related genes 

Gastric cancer-related genes were extracted from 
CancerResource (11), GeneRIF (Gene Reference into 
Function) (12), and COSMIC (Catalogue of Somatic 
Mutations in Cancer) (13) databases. 

Human signaling network construction 

All cellular activities, including division, differentiation, 
and apoptosis are closely associated with signal trans- 
duction. The BioCarta database is the largest collection of 
information on human signaling pathways. We down- 
loaded all the human signaling pathways from BioCarta 
(http://www.biocarta.com/genes/Cellsignaling.asp) (14), 
removed redundant information, and represented all 
proteins with their corresponding genes. In addition, 10 
cancer-related pathways from Cancer CellMap (15) and 
pathways published by Le and Kwon (16) were also used 
to construct the signaling network associated with gast- 
ric cancer. Gastric cancer-related genes extracted from 
different databases were then marked into the signaling 
network. Finally, the network analyzer tool in CytoScape 
was used to calculate network topological characteristics 
such as degree distribution and clustering coefficient. 
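
The two topological measures named above can be sketched in a few lines. This is not the CytoScape NetworkAnalyzer implementation; it is a minimal illustration that treats the network as undirected and uses hypothetical node names.

```python
from collections import defaultdict


def build_undirected(edges):
    """Adjacency sets for an undirected view of the network."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:               # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes with that degree."""
    counts = defaultdict(int)
    for neighbors in adj.values():
        counts[len(neighbors)] += 1
    return dict(counts)


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Toy network with hypothetical genes A-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
adj = build_undirected(edges)
dist = degree_distribution(adj)
cc_a = clustering_coefficient(adj, "A")
```

In the toy network, node A has three neighbors of which only one pair (B, C) is linked, so its clustering coefficient is 1/3, while B has two linked neighbors and a coefficient of 1.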

Motif mining in human signaling network 

Many biological networks consist of specific combina- 
tions of subnets with frequencies of occurrence that are 
significantly higher than random. Topological motifs with 
high frequencies can be used to explain the principles of 
bio-network organization (17). The fast network motif 
detection (FANMOD) software (18) was used for motif 
mining in the human signaling network, because it can 
handle networks with colors in nodes and edges, and 
predict the mining time for the whole network with a high 
operating efficiency. 
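
To make the idea of three-vertex motif mining concrete, the sketch below enumerates every connected node triple in a small directed network and groups the induced subgraphs by isomorphism class. It is not FANMOD: it ignores node and edge colors, does no comparison against randomized networks, and checks all triples by brute force, so it is only suitable for toy inputs. All names are hypothetical.

```python
from itertools import combinations, permutations


def three_node_subgraphs(edges):
    """Yield (triple, induced_edges) for each weakly connected triple."""
    edge_set = set(edges)
    nodes = sorted({n for e in edge_set for n in e})
    for triple in combinations(nodes, 3):
        induced = [
            (u, v)
            for u in triple
            for v in triple
            if u != v and (u, v) in edge_set
        ]
        # Three nodes are connected when at least two of the three
        # possible undirected pairs carry an edge.
        linked_pairs = {frozenset(e) for e in induced}
        if len(linked_pairs) >= 2:
            yield triple, induced


def canonical_form(triple, induced):
    """Smallest 3x3 adjacency string over all node orderings."""
    best = None
    for order in permutations(triple):
        index = {n: i for i, n in enumerate(order)}
        bits = ["0"] * 9
        for u, v in induced:
            bits[index[u] * 3 + index[v]] = "1"
        candidate = "".join(bits)
        if best is None or candidate < best:
            best = candidate
    return best


def count_motif_classes(edges):
    """Count connected three-node subgraphs per isomorphism class."""
    counts = {}
    for triple, induced in three_node_subgraphs(edges):
        key = canonical_form(triple, induced)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Two cascades and one directed cycle (hypothetical nodes).
toy_edges = [
    ("A", "B"), ("B", "C"),
    ("X", "Y"), ("Y", "Z"),
    ("P", "Q"), ("Q", "R"), ("R", "P"),
]
motif_counts = count_motif_classes(toy_edges)
```

Both cascades fall into the same class, and the cycle forms a second class. A tool such as FANMOD additionally compares these counts with counts in randomized networks to decide which classes are over-represented.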

Screening for significant differences among motifs 

To investigate the differences of motifs in the normal 
and cancer states, the significant motif difference (SMD) 
method (19), based on variations of coexpression, was 
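used to calculate the SMD scores of motifs. For a motif 
M_A with three edges, E_1, E_2, and E_3, the difference 
score (S) is defined as: 

$$S(M_A) = \sum_{k=1}^{n} \mathrm{abs}(E_k - E'_k), \quad n = 3 \tag{1}$$

$$E_k = |\mathrm{Pearson}(X, Y)| = \left| \frac{\mathrm{cov}(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}} \right| \tag{2}$$

$$E'_k = |\mathrm{Pearson}(X', Y')| = \left| \frac{\mathrm{cov}(X', Y')}{\sqrt{D(X')}\,\sqrt{D(Y')}} \right| \tag{3}$$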

where X, Y are the gene expression values in the normal 
state and X', Y' are the gene expression values in the 
cancer state; cov denotes covariance and D denotes variance. 
E_k and E'_k are the absolute values of the Pearson 
correlation coefficients between the two genes connected 
by edge k under normal and cancer states, respectively. 

Motifs with SMD scores higher than the threshold were 
considered significantly different, and the threshold was set 
according to the distribution of SMD scores. P = 0.05 
was selected as the significance threshold. 
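
The SMD score defined above can be computed directly from two expression tables. The following is a minimal sketch under stated assumptions: each gene maps to a list of expression values per state, a gene with zero variance contributes a correlation of 0 (a choice made here, not taken from the source), and the gene names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0   # assumption: undefined correlation treated as 0
    return cov / math.sqrt(var_x * var_y)


def smd_score(motif_edges, normal_expr, cancer_expr):
    """Sum over edges of abs(|r_normal| - |r_cancer|)."""
    score = 0.0
    for gene_1, gene_2 in motif_edges:
        e_normal = abs(pearson(normal_expr[gene_1], normal_expr[gene_2]))
        e_cancer = abs(pearson(cancer_expr[gene_1], cancer_expr[gene_2]))
        score += abs(e_normal - e_cancer)
    return score


# Hypothetical three-gene motif with three edges.
motif = [("G1", "G2"), ("G2", "G3"), ("G1", "G3")]
normal = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
cancer = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [1.0, -1.0, -1.0, 1.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
}
score = smd_score(motif, normal, cancer)
```

In this toy case G2 loses its correlation with both G1 and G3 in the cancer state, while the G1-G3 edge is unchanged, so two edges each contribute 1 and the score is 2. With three edges the score is bounded by 3.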

Functional annotations of significantly different 
motifs 

Gene ontology (GO) functional annotations (20) of 
genes in significantly different motifs were performed 
using the Database for Annotation, Visualization, and 
Integrated Discovery (DAVID) (21). Functions with a 
corrected P value (false discovery rate, FDR) of less than 
0.05 were selected. 
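
As an illustration of filtering enrichment results by a corrected P value, the sketch below applies the Benjamini-Hochberg procedure to a list of raw P values. This is one common multiple-testing correction, shown here as an assumption; the FDR column reported by DAVID is computed by the tool itself and may not match this procedure. The P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four GO terms.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
selected = [i for i, p in enumerate(adj_p) if p < 0.05]
```

Only the first term survives the 0.05 cutoff here, although three of the four raw P values are below 0.05, which shows why the corrected value rather than the raw value is used for selection.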

Results 

Gastric cancer-related genes 

By screening the expression profiles and extracting 
from three databases, 5515 and 778 related genes were 
obtained, respectively. 

Human signaling network construction 

The human signaling network was constructed com- 
bining the pathways obtained from databases and 
references. There were 1634 nodes and 5089 regulating 
interactions, including 2403 activated, 741 inhibited, and 
1915 physical interactions in the network (Figure 1). 

The integrated network was hypothesized to have 
the same characteristics, such as small-world, scale-free, 
and hierarchical organization, as protein-protein interaction 
networks and gene networks (22). The CytoScape NetworkAnalyzer 
was used to calculate the degree distribution (Figure 2A) 
and clustering coefficient (Figure 2B) of the network. The 
degree distribution followed a power 
law, and the network had scale-free and small-world 
characteristics. The average degree was 6.3, but was 
10.5 for gastric cancer-related genes, almost all of which 
were hub genes in the network (23). As shown in Figure 
2B, the genes with a higher number of neighbors tended 
to have lower clustering coefficients. 

Figure 1. Human signaling network. Light gray lines represent 
the physical interactions, black lines represent the inhibited 
interactions, and pink lines represent the activated interactions. 
The dark red nodes are cancer-related genes. 

Human signaling network motif mining 

Biological networks are composed of recurring network 
models, and all models are usually combinations of 
motifs with three vertices. We conducted the motif mining 
using the FANMOD software for the gastric cancer-related 
motifs with three vertices. The nodes and edges in 
the network were marked in different colors. Of the 
92 models in total, 90 were marked with cancer-related genes. Of 
a total of 69,492 motifs, 57,942 were marked with 
cancer-related genes. 

Figure 3. Distribution of significant motif difference (SMD) scores 
of motifs marked with gastric cancer-related genes. 

Significantly different motif selection 

SMD scores of the 57,942 motifs were computed using 
the gene expression profiles under normal and cancer 
states. In all, 26,354 motifs were selected with all three 
genes expressed, and the scores of these motifs 
followed a normal distribution (Figure 3). In all, 264 motifs 
differed significantly between the normal and cancer 
states (P<0.05). 

Functional annotations of significantly different 
motifs 

Genes in the significantly different motifs were mainly 
enriched in functions closely related to the occurrence of 
cancer, such as regulation of cell death, regulation of 
programmed cell death, protein amino acid phosphorylation, 
and intracellular signaling cascades (Table 1). This 
result confirmed the relationship between the significantly 
different motifs and gastric cancer. 

Figure 2. A, Degree distribution of the human signaling network. Nodes with higher degree were fewer than the other 
nodes, and the distribution approximated a power law. B, Clustering coefficient distribution of the human signaling network. The average 
clustering coefficient of all nodes was plotted against the number of neighbors, and nodes with more neighbors tended to have 
smaller coefficients. 

Type and rank analysis of significantly different 
motifs 

First, we classified the types of significantly different 
motifs, and found that the types having more than five 
motifs were mainly cascades and positive feedback 
(Figure 4). Next, we ranked the motifs according to their 
SMD scores (Table 2), and examined how the genes in the 
top five motifs relate to gastric cancer. Among all 
the genes, only two, NCOR2 (nuclear receptor corepressor 
2) and ARHGEF7 (Rho guanine nucleotide exchange 
factor 7), were found to have no relation to gastric cancer. 
The relationships of EPOR (erythropoietin receptor), 
MAPK14 (mitogen-activated protein kinase 14), BCL2L1 
(BCL2-like 1), KRT18 (keratin 18), PTPN6 (protein tyrosine 
phosphatase nonreceptor 6), CASP3 (caspase-3), 
TGFBR2 (transforming growth factor-beta, TGFβ, type II 
receptor), AR (androgen receptor), and CASP7 (caspase-7) 
with gastric cancer were already known. 
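
The motif types mentioned above can be told apart programmatically. The sketch below uses textbook definitions as an assumption, since the source does not state its classification rules: a cascade is a two-edge chain A to B to C, and a three-node directed cycle is a positive feedback loop when the product of its edge signs is positive and a negative feedback loop otherwise. Physical (unsigned) interactions are not handled, and all names are hypothetical.

```python
def classify_motif(signed_edges):
    """Classify a three-node motif from (source, target, sign) edges.

    sign is +1 for activation and -1 for inhibition. The rules are
    assumed textbook definitions, not the source's own procedure.
    """
    successors = {}
    for source, target, sign in signed_edges:
        successors.setdefault(source, []).append((target, sign))

    # Feedback: a directed cycle through three distinct nodes.
    for a in successors:
        for b, s1 in successors.get(a, []):
            for c, s2 in successors.get(b, []):
                for d, s3 in successors.get(c, []):
                    if d == a and len({a, b, c}) == 3:
                        if s1 * s2 * s3 > 0:
                            return "positive feedback"
                        return "negative feedback"

    # Cascade: exactly two edges forming a chain over three nodes.
    if len(signed_edges) == 2:
        (s1, t1, _), (s2, t2, _) = signed_edges
        if (t1 == s2 and s1 != t2) or (t2 == s1 and s2 != t1):
            return "cascade"
    return "other"


cascade = classify_motif([("A", "B", 1), ("B", "C", -1)])
positive = classify_motif([("A", "B", -1), ("B", "C", -1), ("C", "A", 1)])
negative = classify_motif([("A", "B", 1), ("B", "C", 1), ("C", "A", -1)])
fan_out = classify_motif([("A", "B", 1), ("A", "C", 1)])
```

A loop with two inhibitions is classified as positive feedback because the two sign inversions cancel, whereas a single inhibition in the loop gives negative feedback.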

Discussion 

The human signaling network we constructed was 
very large and could reveal additional signal-associated 
information about gastric cancer. Analysis of the topological 
characteristics of the network revealed that gastric 
cancer-related genes had a higher average degree than 
that of all the genes taken together, and that most of these 
cancer-related genes were hub genes in the network. This 
result further confirmed the importance of cancer-related 
genes (23). We also conducted cancer-related motif 
mining for a better understanding of the mechanisms of 
cancer occurrence and development. Cascade and 
positive feedback were the two types of motifs with 
significantly different normal and cancer state SMD 
scores, suggesting that they are disrupted in the cancer 
state, which may increase the speed of signal transduction. 
Various types of motifs are associated with cell 
functions. The significance of the cascade type lies in 
its influence on cell proliferation and differentiation, the 
negative feedback type participates in an adaptive 
response, and the positive feedback type can enhance 
signal robustness (24,25). Thus, efficient signal transduction 
may be the reason why cancer cells can proliferate so 
rapidly. 

Table 1. Top 15 functions of genes in the significantly different motifs. 

Category       Term                                                               FDR
GOTERM_BP_FAT  GO:0007242-intracellular signaling cascade                         6.64E-25
GOTERM_BP_FAT  GO:0043067-regulation of programmed cell death                     1.27E-23
GOTERM_BP_FAT  GO:0010941-regulation of cell death                                1.55E-23
GOTERM_CC_FAT  GO:0005829-cytosol                                                 8.74E-23
GOTERM_BP_FAT  GO:0042981-regulation of apoptosis                                 3.89E-22
GOTERM_BP_FAT  GO:0010033-response to organic substance                           6.09E-22
GOTERM_BP_FAT  GO:0010604-positive regulation of macromolecule metabolic process  1.51E-21
GOTERM_BP_FAT  GO:0043065-positive regulation of apoptosis                        5.66E-20
GOTERM_BP_FAT  GO:0043068-positive regulation of programmed cell death            7.43E-20
GOTERM_BP_FAT  GO:0010942-positive regulation of cell death                       8.89E-20
GOTERM_BP_FAT  GO:0006468-protein amino acid phosphorylation                      3.07E-19
GOTERM_BP_FAT  GO:0007167-enzyme linked receptor protein signaling pathway        9.14E-19
GOTERM_BP_FAT  GO:0006796-phosphate metabolic process                             5.50E-18
GOTERM_BP_FAT  GO:0006793-phosphorus metabolic process                            5.50E-18
GOTERM_BP_FAT  GO:0031328-positive regulation of cellular biosynthetic process    6.66E-18

BP: biological process; CC: cellular component; GO: gene ontology; FDR: false discovery rate. 

Figure 4. Models of significantly different motifs. Red arrows are 
the activated interactions, while green arrows are the inhibited 
interactions. Black nodes represent normal genes, while red nodes 
represent gastric cancer-related genes. 

We mapped gene expression values to the signaling 
network and then screened the significantly different 
motifs according to differences in coexpression of motif 
genes between the normal and cancer states. Genes 
in the selected motifs were mainly enriched in 
functions implicated in cancer development, such 
as regulation of cell death, regulation of programmed cell 
death, protein amino acid phosphorylation, and intracellular 
signaling cascades. Recently, studies have shown that 
amino acids are not only cell signaling molecules but also 
regulators of gene expression and the protein phosphorylation 
cascade (26). Signaling pathways rely on protein 
phosphorylation for the accurate transmission of signals 
and ultimately lead to the activation 
of specific transcription factors that induce the 
expression of appropriate target genes (27). Extracellular 
signals are transmitted from the cell membrane to genes in 
the nucleus via several communication lines known as 
intracellular signaling pathways, and the transmission of 
signals through these pathways involves sequential 
phosphorylation events, in many cases by protein kinases, that 
are termed kinase cascades (28). Among signal transduction 
events, protein phosphorylation modulated by protein 
kinases and phosphatases is an important posttranslational 
modification event in a variety of cells. Such 
phosphorylation plays a critical role in signal transduction, 
cell growth, differentiation, and oncogenesis (29). All 
the enriched functions in this network were involved in 
cancer development. Thus, the selected motifs were also 
related to gastric cancer. 

Table 2. Top 5 motifs ranked by the significant motif difference scores. 

Motif      Gene 1   Gene 2  Gene 3   Score
003300001  EPOR     PTPN6   KRT18*   2.503577
000000211  MAPK14*  NCOR2   ARHGEF7  2.40976
000303001  PTPN6    KRT18*  TGFBR2   2.390741
000300011  MAPK14*  NCOR2   AR       2.3854
100010231  BCL2L1*  CASP3   CASP7    2.378815

*Already known genes. 

EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, 
TGFBR2, AR, and CASP7 were genes in the five motifs 
with the highest SMD scores, and some of them are 
already known to be gastric cancer related. NCOR2 and 
ARHGEF7 were the only two genes for which there have 
been no reports of a correlation with gastric cancer. 
EPOR is a member of the cytokine receptor superfamily, 
and the increased expression of EPOR is a potential, 
significant prognostic marker in the carcinogenesis, 
angiogenesis, and progression of gastric cancer (30). 
The protein tyrosine phosphatase (PTP) family plays an 
important part in the inhibition or control of growth, and 
members may exert oncogenic functions (31). Several 
studies have detected aberrant DNA methylation of 
the PTPN6 gene in gastric cancer (32,33). TGFBR2, a 
constitutively active kinase, is reported to play a tumor 
suppressor role in the TGFβ pathway in gastric cancer 
(34). Studies have also linked AR (35), 
CASP3 (36), and CASP7 (37) to gastric cancer. 
NCOR2, which participates in a corepressor complex 
resulting in chromatin condensation, is involved in many 
cancers (38). It promotes the deacetylation of histones to 
silence genes. In addition, ARHGEF7, also known as 
PAK-interacting exchange factor, participates in the 
activation of Ras family genes (39). Based on these 
findings, even though there is no direct evidence, 
NCOR2 and ARHGEF7 may be latent gastric 
cancer-related genes. 

Gastric cancer is a common, fatal malignancy world- 
wide. At present, therapeutic decisions are based on 
clinical and pathological parameters, including age, 
tumor-involved lymph nodes, metastases, stage, and 
histological grade. Although useful, these factors often 
fail to differentiate more aggressive tumor types from less 
aggressive types (40). As a result, there is an urgent need 
to find specific markers. If motifs, as functional units, can 
be used as biomarkers, then the diagnostic efficiency will 
be greatly increased. We could then find the locations of 
the already known cancer-related genes in a motif, and 
see which genes they affect and which genes affect them. 
The development of cancers might then be suppressed by 
inhibiting the signal transductions of their upstream and 
downstream genes with new potential drugs for gastric 
cancer. 